ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English

Authors

  • Tim vor der Brück
  • Alexander Mehler
  • Zahurul Islam
Abstract

The paper describes a procedure for the automatic generation of a large full-form lexicon of English. We emphasize two statistical methods for lexicon extension and adjustment: a letter-based HMM and a detector of spelling variants and misspellings. The resulting resource, ColLex.EN, is evaluated with respect to two tasks: text categorization and lexical coverage, exemplified by the SUSANNE corpus and the Open ANC.


Similar resources

Automatic Generation of Multilingual Lexicon by Using Wordnet

A lexicon is the heart of any language processing system. Accurate words with grammatical and semantic attributes are essential or highly desirable for any application, be it machine translation, information extraction, various forms of tagging or text mining. However, good quality lexicons are difficult to construct, requiring an enormous amount of time and manpower. In this paper, we present a meth...


Developing and Evaluating a Searchable Swedish-Thai Lexicon

We present an automatically created Swedish-Thai lexicon. The lexicon was created by matching the English translations in a Thai-English and a Swedish-English lexicon. The search interface to the lexicon includes several NLP tools to help the target group: second language learners of Swedish. These include automatic generation of inflectional forms of words, automatic spelling correction, lemma...


Machine Translation without a Bilingual Dictionary

This paper outlines experiments conducted to determine the contribution of the traditional bilingual dictionary in the automatic alignment process to learn translation patterns, and at runtime. We found that by using automatically derived translation word pairs combined with a function word only lexicon, we were able to either match or nearly match the translation quality of the system that use...


Multilingual Aliasing for Auto-Generating Proposition Banks

Semantic Role Labeling (SRL) is the task of identifying the predicate-argument structure in sentences with semantic frame and role labels. For the English language, the Proposition Bank provides both a lexicon of all possible semantic frames and large amounts of labeled training data. In order to expand SRL beyond English, previous work investigated automatic approaches based on parallel corpor...


Automatic Lexicon Generation through WordNet

A lexicon is the heart of any language processing system. Accurate words with grammatical and semantic attributes are essential or highly desirable for any application, be it machine translation, information extraction, various forms of tagging or text mining. However, good quality lexicons are difficult to construct, requiring an enormous amount of time and manpower. In this paper, we present a m...



Publication date: 2014